Estimating the probability for a protein to have a new fold: A statistical computational model.
Abstract
Structural genomics aims to solve a large number of protein structures that represent the protein space. Because an exhaustive solution for all structures currently seems prohibitively expensive, the challenge is to define a relatively small set of proteins with new, currently unknown folds. This paper presents a method that assigns each protein a probability of having an unsolved fold. The method makes extensive use of ProtoMap, a sequence-based classification, and SCOP, a structure-based classification. ProtoMap encodes the relationships among proteins in the protein space as a graph whose vertices correspond to 13,354 clusters of proteins. A representative fold for each cluster with at least one solved protein is determined after superposition of all SCOP (release 1.37) folds onto ProtoMap clusters. Distances within the ProtoMap graph are computed from each representative fold to the neighboring folds. The distribution of these distances is used to create a statistical model for distances among folds that are already known and folds that have yet to be discovered. The distance distributions for solved and unsolved proteins differ significantly. This difference makes it possible to use Bayes' rule to derive a statistical estimate of the probability that any given protein has a yet-undetermined fold. Proteins with the highest probability of representing a new fold constitute the target list for structural determination. Our predicted probabilities for unsolved proteins correlate very well with the proportion of new folds among recently solved structures (new SCOP 1.39 records) that are disjoint from our original training set.
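The two computational steps named in the abstract, measuring distances in the cluster graph and applying Bayes' rule to those distances, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy graph, the use of unweighted hop counts to the nearest solved cluster as the distance, the likelihood tables, the prior, and the function names `nearest_solved_distance` and `posterior_new_fold`.

```python
from collections import deque


def nearest_solved_distance(graph, start, solved):
    """Breadth-first search: hop count from `start` to the closest solved cluster.

    Assumption: edges are unweighted. Returns None if no solved cluster is reachable.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in solved:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def posterior_new_fold(distance, lik_new, lik_known, prior_new):
    """Bayes' rule: P(new | d) = P(d | new) P(new) / [P(d | new) P(new) + P(d | known) P(known)]."""
    p_new = lik_new.get(distance, 0.0) * prior_new
    p_known = lik_known.get(distance, 0.0) * (1.0 - prior_new)
    total = p_new + p_known
    # If the distance was never observed in either table, fall back to the prior.
    return p_new / total if total > 0 else prior_new


# Hypothetical toy data: four clusters in a chain, only cluster "A" has a solved structure.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
toy_solved = {"A"}

# Hypothetical distance likelihoods for new-fold and known-fold proteins.
toy_lik_new = {1: 0.1, 2: 0.3, 3: 0.6}
toy_lik_known = {1: 0.6, 2: 0.3, 3: 0.1}

d = nearest_solved_distance(toy_graph, "D", toy_solved)
p = posterior_new_fold(d, toy_lik_new, toy_lik_known, prior_new=0.5)
print(f"distance={d}, P(new fold | distance)={p:.3f}")
```

In this toy setting, clusters far from any solved structure receive a higher posterior because the assumed new-fold likelihood puts more mass on large distances. Ranking proteins by this posterior would give a target list in the spirit of the abstract, though the paper's actual distance measure and distributions may differ.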
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume 97, Issue 10
Publication date: 2000